High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic
Abstract
Researchers from diverse disciplines are increasingly incorporating numeric high-order data, i.e., numeric tensors, within their practice. As with the matrix-vector (MV) paradigm, the development of multi-purpose yet high-performance sparse data structures and algorithms for arithmetic calculations, e.g., those found in Einstein-like notation, is crucial for the continued adoption of tensors. We use the motivating example of high-order differential operators to illustrate this need. As sparse tensor arithmetic is an emerging research topic, with challenges distinct from the MV paradigm, many aspects require further articulation and development. This work focuses on three core facets. First, aligning with prominent voices in the field, we emphasise the importance of data structures able to accommodate the operational complexity of tensor arithmetic. However, we describe a linearised coordinate (LCO) data structure that provides faster and more memory-efficient sorting performance in support of this operational complexity. Second, flexible data structures, like the LCO, rely heavily on sorts and permutations. We introduce an innovative permutation algorithm, based on radix sort, that is tailored to rearrange already-sorted sparse data, producing significant performance gains. Third, we introduce a novel poly-algorithm for sparse tensor products, where hyper-sparsity is a possibility. Different manifestations of hyper-sparsity demand their own customised approach, which our multiplication poly-algorithm is the first to provide. These developments are incorporated within our LibNT and NTToolbox software libraries. Benchmarks, frequently drawn from the high-order differential operators example, demonstrate the practical impact of our routines, with speed-ups of 40% or higher compared to alternative high-performance implementations. Comparisons against the MATLAB Tensor Toolbox show speed improvements of over 10 times.
Thus, these advancements produce significant practical improvements for sparse tensor arithmetic.
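To make the idea of a linearised coordinate format concrete, the following is a minimal Python sketch and not the LibNT implementation. It assumes that each non-zero is stored as a single integer key (first mode varying fastest) alongside its value, and that entries are kept sorted by key. The function names `linearise`, `delinearise`, and `permute_modes` are hypothetical, and the rearrangement step uses Python's built-in sort as a stand-in for the radix-sort-based permutation algorithm described in the abstract.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[int, float]  # (linearised key, value)


def linearise(index: Sequence[int], dims: Sequence[int]) -> int:
    """Map a multi-index to one integer key (first mode varies fastest)."""
    key, stride = 0, 1
    for i, d in zip(index, dims):
        if not 0 <= i < d:
            raise IndexError("index out of range")
        key += i * stride
        stride *= d
    return key


def delinearise(key: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Invert linearise: recover the multi-index from the integer key."""
    index = []
    for d in dims:
        index.append(key % d)
        key //= d
    return tuple(index)


def permute_modes(
    entries: List[Entry], dims: Sequence[int], order: Sequence[int]
) -> Tuple[List[Entry], List[int]]:
    """Reorder the tensor modes and restore sorted order on the new keys.

    Assumption: a generic comparison sort is used here; the paper's
    tailored radix-sort permutation is not reproduced.
    """
    new_dims = [dims[m] for m in order]
    out = []
    for key, value in entries:
        idx = delinearise(key, dims)
        new_key = linearise([idx[m] for m in order], new_dims)
        out.append((new_key, value))
    out.sort(key=lambda kv: kv[0])
    return out, new_dims


# A 2 x 3 x 4 tensor with three non-zeros, stored sorted by key.
dims = (2, 3, 4)
entries = sorted(
    (linearise(idx, dims), val)
    for idx, val in [((1, 0, 0), 1.0), ((0, 2, 1), 2.0), ((1, 1, 3), 3.0)]
)
permuted, new_dims = permute_modes(entries, dims, order=(2, 0, 1))
```

Storing one integer per non-zero means that sorting compares single keys rather than tuples of indices, which is the kind of benefit the abstract attributes to the LCO format. Changing the mode order changes every key, which is why such a format leans heavily on fast re-sorting.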
Related resources
Hardware Acceleration Technologies in Computer Algebra : Challenges
The objective of high performance computing (HPC) is to ensure that the computational power of hardware resources is well utilized to solve a problem. Various techniques are usually employed to achieve this goal. Improving the algorithm to reduce the number of arithmetic operations, modifying data access patterns or rearranging data in order to reduce memory traffic, code optimization at...
Analyzing the Performance of a Sparse Matrix Vector Multiply for Extreme Scale Computers
As high-performance computing systems continue to progress towards extreme scale, the scalability of applications becomes critical. The scalability of an algorithm is dependent on interconnect properties, such as latency and bandwidth, and is often limited by network contention. Sparse matrix-vector multiplication (SpMV) is fundamental to a large class of HPC applications. We investigate the per...
Subdivision Surface Evaluation as Sparse Matrix-Vector Multiplication
We present an interpretation of subdivision surface evaluation in the language of linear algebra. Specifically, the vector of surface points can be computed by left-multiplying the vector of control points by a sparse subdivision matrix. This “matrix-driven” interpretation applies to any level of subdivision, holds for many common subdivision schemes (including Catmull-Clark and Loop), supports...
Modulo 2^n+1 Multiplier and Multiply-Accumulator for Digital Signal Processors
Nowadays, digital signal processors (DSPs) are appropriate choices for real-time image and video processing in embedded multimedia applications, not only because of their superior signal processing performance but also because of their high levels of integration and very low power consumption. Filtering, which consists of multiple addition and multiplication operations, is one of the most fundamental operatio...
A Portable High Performance Multiprecision Package
The author has written a package of Fortran routines that perform a variety of arithmetic operations and transcendental functions on floating point numbers of arbitrarily high precision, including large integers. This package features (1) virtually universal portability, (2) high performance, especially on vector supercomputers, (3) advanced algorithms, including FFT-based multiplication and qu...
Journal:
- CoRR
Volume: abs/1802.02619
Publication date: 2018